Test: Test PR to test codeserver for Z #1621
base: rhoai-3.0
Conversation
Hi @Meghagaur. Thanks for your PR. I'm waiting for a red-hat-data-services member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/build-konflux

/ok-to-test

/build-konflux
@Meghagaur: The following test failed; say `/retest` to rerun all failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/build-konflux

1 similar comment

/build-konflux

/build-konflux
Force-pushed from c50cb43 to f6091a7
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/build-konflux

1 similar comment

/build-konflux
Force-pushed from aaf09fd to 001c4f6
/build-konflux
Force-pushed from 65abde5 to a91156b
/build-konflux
…rm64" (opendatahub-io#2574)

This reverts commit c8ff00a. Originally added in opendatahub-io#1396 because of Pipenv limitations that are no longer present in uv.
…pipeline

The error message you provided, `(error: exit status 1; output: write /opt/app-root/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2: no space left on device)`, indicates a **disk-space limitation** encountered during the container build process, specifically while writing an NVIDIA-related Python package file. This issue, commonly reported as `No space left on device`, generally occurs when the build pipeline attempts to write more data than is available in a shared volume or local ephemeral storage. Here is a detailed analysis and potential solutions based on the source material, particularly those dealing with large container images and multi-platform builds.

### Root Cause and General Solution

1. **Shared volume overflow:** The error likely means your build pipeline is consuming too much space in a shared volume, typically the workspace declared in your `PipelineRun` YAML. The default Tekton workspace size is often small (e.g., 1 GB).
2. **Solution: increase workspace storage:** The standard recommendation is to **request more disk space** by increasing the storage value within the `.spec.workspaces` section of your relevant `PipelineRun` files:
   * For example, one user solved a `prefetch-dependencies` task failure by increasing storage to 2Gi.
   * However, for very large builds (like those involving AI/NVIDIA libraries, as your error suggests), you may need significantly more space. One user noted that large images may require **2 to 3 times the actual file size** during building and tagging.

### Context Specific to Large/AI/Multi-Arch Builds

Your specific error involving `/opt/app-root/lib/python3.12/site-packages/nvidia/nccl/lib/libnccl.so.2` places this failure within the context of building large containers that rely on extensive dependencies, often seen with RHEL AI/AIPCC teams.
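A minimal sketch of the workspace bump described above, assuming a standard Tekton `PipelineRun` with a volume-claim-backed workspace (the run name, workspace name, and sizes are illustrative, not taken from this repository):

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: codeserver-build-run        # hypothetical name
spec:
  workspaces:
    - name: workspace
      volumeClaimTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 5Gi          # raised from a small 1Gi default; large AI builds may need far more
```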
* **Large image size:** Tasks involving machine-learning or large model images (sometimes referred to as "modelcar" images) frequently face this issue because the artifacts being built are very large. For instance, certain AI models require ephemeral-storage volumes that can exceed 200Gi.
* **aarch64/arm64 architecture:** In known instances of this exact type of error (involving unpacking NVIDIA/vllm dependencies), the failure consistently occurred on **aarch64/arm64 builds**, while x86_64 builds passed. The default disk size for these nodes may be limited (e.g., 40 GB).
* **Failure point:** The failure you see occurs while **copying layers and metadata for the container**, likely during the unpacking or committing phase, pointing to the local ephemeral storage running out of space.

### Platform-Specific Workaround

Since the issue seems related to the underlying machine size, especially if you are targeting aarch64 (arm64), a suggested fix is to utilize a remote platform with guaranteed larger disk space:

* **Override the platform:** You can try replacing the default build platform in your `PipelineRun` configuration with a larger machine type. For **arm64** builds experiencing this failure, a recommendation was made to use `linux-d160-m2xlarge/arm64`, which provides **160 GB of disk space**.
* **Configure `buildah-remote`:** If you are using a remote build task (like `buildah-remote`), you would need to specify this larger platform flavor in the configuration.

If increasing the default workspace size is insufficient, addressing the underlying node size for large builds is crucial.
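A hedged sketch of the platform override, assuming the common Konflux convention of passing build platforms as a `PipelineRun` parameter (the parameter name and surrounding fields are assumptions based on typical Konflux pipelines, not verified against this repository):

```yaml
spec:
  params:
    - name: build-platforms
      value:
        - linux/x86_64
        - linux-d160-m2xlarge/arm64   # 160 GB disk, replacing the default arm64 flavor
```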
…age components and new build-platform entries for specific components
…io#2568)

* Enabled TrustyAI Notebook for s390x
* Address comments
* Removed EPEL from output stage
* Add s390x label for Konflux

Signed-off-by: Nishan Acharya <[email protected]>
…io#2575)

* s390x changes for codeserver
* Update get_code_server_rpm.sh: fix conditional syntax for the architecture guard. The missing space before `"$ARCH"` turns the token into `||"$ARCH"`, so bash complains with "conditional binary operator expected" and exits before any build logic runs (as suggested by coderabbitai).
* Update devel_env_setup.sh: changed to a proper `&&` chain for dnf.
* Add files via LFS.

Co-authored-by: aryabjena <[email protected]>
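The `[[ … ]]` bug described in the commit message can be illustrated with a minimal sketch. The variable name `ARCH` follows the commit message; the guarded branches and echoed strings are hypothetical. Without a space after `||`, bash parses `||"$ARCH"` as a single token and aborts with "conditional binary operator expected" before any build logic runs.

```shell
#!/usr/bin/env bash
set -euo pipefail

ARCH="$(uname -m)"

# Broken form (space missing after ||):
#   if [[ "$ARCH" == "s390x" ||"$ARCH" == "aarch64" ]]; then
# bash reports: conditional binary operator expected

# Fixed form: a space separates || from the next operand.
if [[ "$ARCH" == "s390x" || "$ARCH" == "aarch64" ]]; then
    echo "architecture-specific code-server RPM path"
else
    echo "default code-server RPM path"
fi

# Proper && chaining for dnf, in the spirit of the devel_env_setup.sh fix:
# dnf install -y git && dnf clean all
```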
Sync rhds:main from odh:main
Add BASE_IMAGE as a buildArg for the build configs and remove the AIPCC bases, as we got unauthorized-access errors
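A minimal sketch of the `BASE_IMAGE` build-arg pattern this commit describes (the default image reference and stage name are illustrative, not the repository's actual values):

```dockerfile
# Overridable at build time, e.g. from a BuildConfig or CLI:
#   --build-arg BASE_IMAGE=registry.example.com/some/base:tag
ARG BASE_IMAGE=registry.access.redhat.com/ubi9/python-312:latest
FROM ${BASE_IMAGE} AS base
```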
Switch from aipcc to UBI/RHEL images for rstudio
a91156b
to
9101b77
Compare
Test PR, not intended for merge: a quick verification of the codeserver build on Konflux, checking the Dockerfile and platform-specific pylock/pyproject changes.